*v7

*This file creates an adjustment multiplier for the spillovers (out-of-sample/in-sample patents).
*Our countrystocks (which get distributed as spillovers back to the firms)
*are essentially all patents "assigned" to a country through the unique inventor
*countries listed in the applications of a patent family.
*But we do not simulate all firms/patents that we know about.
*So, for every year-country combination, we need to adjust our
*patent count and add the patents generated by out-of-sample firms.
*We know the actual ratio of in-sample to out-of-sample patents and adjust by that.

*But we cannot assume that the out-of-sample firms just keep patenting as
*they always did when our simulation changes the course for our in-sample
*firms. Therefore, we assume (!) that the out-of-sample firms
*behave, on average, the same as the in-sample firms.
*I.e., if our in-sample firms contribute three times as many patents to a
*country's countrystock in a given year, we assume
*the same happens for the out-of-sample firms.
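
* Toy illustration of the logic above (an added sketch with made-up numbers;
* sim_insample and sim_total are hypothetical names, not variables of this project,
* and multiplying the simulated count by the multiplier is our reading of the text above).
* With 300 patents in the full data and 100 weighted in-sample patents, the
* multiplier is 3, so 120 simulated in-sample patents scale to 360 in total.
* The toy data is discarded by the "use ..., clear" further down.
clear
input str2 ctry year count_full count_wt_sample sim_insample
"DE" 2000 300 100 120
"FR" 2000  50  25  30
end
gen outsample_mult = count_full / count_wt_sample
gen sim_total = sim_insample * outsample_mult
list, noobs
assert outsample_mult == 3 & sim_total == 360 if ctry == "DE"
assert outsample_mult == 2 & sim_total == 60 if ctry == "FR"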


* ----------------------------------- *
* Prepare division-year-country counts
* ----------------------------------- *


* prepare count_full, the full-sample patent count (in a balanced country-year panel)
*load our full sample count data used for countrystocks in the main analysis
use invt year nb_${ttt} using ${d}datasets/spillovers/ctry_inventions_count_pauto95.dta, clear
gen count_full = nb_${ttt}
drop nb_${ttt}
gen sdivision = "pauto95"
expand 2, gen(new)
mmerge invt year using ${d}datasets/spillovers/ctry_inventions_count_auto95.dta, unmatched(both) ukeep(nb_${ttt})
replace new = 1 if _m == 2
replace count_full = nb_${ttt} if new == 1 
replace sdivision = "auto95" if new == 1
drop nb_${ttt} _m new
encode sdivision, gen(division)
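* Sanity check (an added sketch, not part of the original workflow): the stock
* assignment at the end of this file relies on encode numbering the divisions
* alphabetically, i.e. 1 = auto95 and 2 = pauto95. Stop here if that does not hold.
assert division == 1 if sdivision == "auto95"
assert division == 2 if sdivision == "pauto95"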
drop sdivision

*clean this up a bit: restrict to the 1995-2011 window and drop rows without a division
keep if year >= 1995 & year <= 2011
drop if missing(division)
ren invt_country ctry
sort ctry year division
egen ctrydiv = group(ctry division)

*create a lookup table for the ctry-division combinations so we know where we have data
preserve
    duplicates drop ctrydiv, force
    keep ctrydiv ctry division
    tempfile ctrydiv_lookup
    save `ctrydiv_lookup'
restore

*complete the panel to balance
drop ctry division
xtset ctrydiv year
tsfill, full
mmerge ctrydiv using `ctrydiv_lookup'
replace count_full = 0 if missing(count_full)
drop _m ctrydiv
sort ctry year division
tempfile data
save `data', replace

* ----------------------------------- *
* Sum up firm-weighted sample counts
* ----------------------------------- *

* prepare bvd-ctry-year-div data and merge weights
use BvD ctry using ${d}datasets/macrosim/bvd_ctry_year_inventor_weights${iwvers}_long.dta, clear
duplicates drop

*load the distributing weights
mmerge BvD ctry using ${d}datasets/macrosim/bvd_ctry_inventor_weightspre_long.dta, unmatched(both) ukeep(ctry BvD)
drop _m
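*number the countries within each firm; sorting on ctry within BvD makes the numbering reproducible
bys BvD (ctry): gen ctryid = _n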
tempfile bvdctrys
save `bvdctrys', replace

* sample weight, collapse to ctry-div-year level
use depvar BvD year division firm_division using ${d}datasets/macrosim/BvD_year_div_${ln_vers}_pauto95_${chosen_spec}.dta, clear

*create a list of firm-technology combinations for which we have both types of weights (assigning and distributing spillover weights)
*Not necessarily in every year: when we have no weights for a year, the weight is set to 0 further below.
preserve
    keep BvD year firm_division
    mmerge BvD using `bvdctrys', unmatched(master) 
    assert _m == 3
    keep firm_division year ctry ctryid
    save ${d}datasets/macrosim/BvD_ctry_year_div_list_${ln_vers}_pauto95_${iwvers}.dta, replace
restore

mmerge firm_division year using ${d}datasets/macrosim/BvD_ctry_year_div_list_${ln_vers}_pauto95_${iwvers}, unmatched(master)
mmerge BvD year ctry division using ${d}datasets/macrosim/bvd_ctry_year_inventor_weights${iwvers}_long.dta

*when a firm-division combination receives no spillovers in a year in reality, we set the weight to 0
replace weight = 0 if _m == 1
drop _m

* weight each firm's patent count by its country weight (summed to the ctry-year-div level below)
gen dep_weighted = depvar * weight


*collect at the ctry-year-div level (the multiplier is defined at that level)
collapse (sum) count_wt_sample = dep_weighted, by(ctry year division)
sort ctry year division

* ----------------------------------------------- *
* Merge with full data count, calculate multiplier
* ----------------------------------------------- *

* merge the full data patents counts, calculate the multiplier
mmerge ctry division year using `data', unmatched(both)
replace count_wt_sample = 0 if _m == 2 
drop _m

*the actual multiplier creation. 120 lines of code to get to the actual thing :/
gen outsample_mult =  count_full / count_wt_sample
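* Diagnostic (an added sketch, not part of the original workflow): Stata returns
* missing when dividing by zero, so cells with full-sample patents but a zero
* weighted in-sample count get no multiplier. Count them to see how many there are.
count if count_wt_sample == 0 & count_full > 0 & !missing(count_full)
count if missing(outsample_mult)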


* merge the original stock variable (for the 1995 start)
mmerge ctry year using ${d}datasets/spillovers/ctry_inventions_stocks.dta, unmatched(master) umatch(invt_country year) ukeep(k${ttt}_pauto95 k${ttt}_auto95 ${ttt}_pauto95 ${ttt}_auto95 )

*generate variables to match the format expected by the simulation
*(encode numbered the divisions alphabetically: 1 = auto95, 2 = pauto95)
gen data_ctrystock = k${ttt}_auto95 if division == 1
replace data_ctrystock = k${ttt}_pauto95 if division == 2
drop k${ttt}_auto95 k${ttt}_pauto95 ${ttt}_auto95 ${ttt}_pauto95 _m 
sort ctry year

save ${d}datasets/macrosim/ctry_year_div_outsamplemultipliers_${ln_vers}pauto95_${chosen_spec}${FE}_${iwvers}${osmtrim}.dta, replace

